… assumption …… 6
A Corrections to the main paper …… 2
B Problem setup …… 3

In the course of preparing the supplementary materials we identified the following two mistakes. For the convenience of the reader we provide the full, corrected table below; C is an appropriately chosen constant. [Corrected table: conditions on n, m, 1/γ, and k, comparing Frei et al. (2022), Xu & Gu (2023), Theorem 3.1, Theorem 3.6, and Theorem 3.8.] The same mistake also means that the sentence starting on line 188 ("Comparing …") …

In order to provide a convenient reference for the reader, we summarize our notation as follows. As such we typically resort to using a generically large enough constant C. For the reader's convenience we recap the data model studied in this work. We assume test data are drawn mutually i.i.d. Regarding the initialization of the network weights, for convenience we assume each neuron's … To this end, we introduce the following notation, where p ∈ {−1, +1}. We have

P((B < κT) ∩ (T > 0) | ⟨w, v⟩ > 0) ≥ 1 − P(T = 0 | ⟨w, v⟩ > 0) − P(B ≥ κT | ⟨w, v⟩ > 0),

therefore it suffices to upper bound the two probabilities on the right-hand side. Using a variant of Hoeffding's bound for sampling without replacement (see Proposition …), … Based on Lemma B.2, the following lemma bounds the probability that … On the counting functions: in particular we write P_{+1}(i, l) + P_{−1}(i, l) = P(i, i) = 1/2 and hence we conclude p + q = 1/2. As a result, observe by the data model, described in Section B.2, that … We will often make use of the following similar but more pessimistic bounds on the activations.
A Generalized Adaptive Joint Learning Framework for High-Dimensional Time-Varying Models
In modern biomedical and econometric studies, longitudinal processes are often characterized by complex time-varying associations and abrupt regime shifts that are shared across correlated outcomes. Standard functional data analysis (FDA) methods, which prioritize smoothness, often fail to capture these dynamic structural features, particularly in high-dimensional settings. This article introduces Adaptive Joint Learning (AJL), a hierarchical regularization framework designed to integrate functional variable selection with structural changepoint detection in multivariate time-varying coefficient models. Unlike standard simultaneous estimation approaches, we propose a theoretically grounded two-stage screening-and-refinement procedure. This framework first combines adaptive group-wise penalization with sure screening principles to robustly identify active predictors, followed by a refined fused regularization step that effectively borrows strength across multiple outcomes to detect local regime shifts. We provide a rigorous theoretical analysis of the estimator in the ultra-high-dimensional regime (p >> n). Crucially, we establish the sure screening consistency of the first stage, which serves as the foundation for proving that the refined estimator achieves the oracle property: performing as well as if the true active set and changepoint locations were known a priori. A key theoretical contribution is the explicit handling of approximation bias via undersmoothing conditions to ensure valid asymptotic inference. The proposed method is validated through comprehensive simulations and an application to Sleep-EDF data, revealing novel dynamic patterns in physiological states.
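The two-stage idea in the abstract can be caricatured with plain surrogates. A minimal sketch, assuming simple stand-ins: stage 1 below uses marginal-correlation sure screening rather than the paper's adaptive group-wise penalty, and stage 2 thresholds jumps in a fitted coefficient path rather than solving a fused-regularization problem; the function names, data, and thresholds are all illustrative.

```python
import numpy as np

def screen_predictors(X, y, keep):
    """Stage-1 surrogate: rank predictors by marginal correlation magnitude
    and keep the top `keep` (sure-screening style)."""
    scores = np.abs(X.T @ (y - y.mean())) / len(y)
    return {int(i) for i in np.argsort(scores)[::-1][:keep]}

def detect_changepoints(beta_path, tol):
    """Stage-2 surrogate: declare a changepoint wherever the fitted
    time-varying coefficient jumps by more than `tol`; a fused penalty
    would instead shrink the smaller differences exactly to zero."""
    return [int(j) for j in np.where(np.abs(np.diff(beta_path)) > tol)[0] + 1]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = 2.0 * X[:, 3] - 3.0 * X[:, 17] + 0.1 * rng.standard_normal(200)
active = screen_predictors(X, y, keep=5)        # should contain {3, 17}
cps = detect_changepoints(np.array([1.0, 1.0, 1.0, 3.0, 3.0]), tol=0.5)  # [3]
```

Only the screening/refinement division of labor is the point here; the statistical guarantees in the abstract concern the penalized estimators, not these surrogates.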
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.45)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.66)
- Health & Medicine > Therapeutic Area > Sleep (0.46)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Sharp Structure-Agnostic Lower Bounds for General Functional Estimation
Jin, Jikai, Syrgkanis, Vasilis
The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong structural assumptions, which can be misspecified in practice and complicate deployment. This limitation has sparked growing interest in structure-agnostic approaches -- methods that debias black-box nuisance estimates without imposing structural priors. Understanding the fundamental limits of these methods is therefore crucial. This paper provides a systematic investigation of the optimal error rates achievable by structure-agnostic estimators. We first show that, for estimating the average treatment effect (ATE), a central parameter in causal inference, doubly robust learning attains optimal structure-agnostic error rates. We then extend our analysis to a general class of functionals that depend on unknown nuisance functions and establish the structure-agnostic optimality of debiased/double machine learning (DML). We distinguish two regimes -- one where double robustness is attainable and one where it is not -- leading to different optimal rates for first-order debiasing, and show that DML is optimal in both regimes. Finally, we instantiate our general lower bounds by deriving explicit optimal rates that recover existing results and extend to additional estimands of interest. Our results provide theoretical validation for widely used first-order debiasing methods and guidance for practitioners seeking optimal approaches in the absence of structural assumptions. This paper generalizes and subsumes the ATE lower bound established in \citet{jin2024structure} by the same authors.
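The doubly robust estimator whose optimality the abstract establishes for the ATE has a standard textbook form (AIPW); the sketch below is that generic estimator with black-box nuisance predictions plugged in, not the paper's construction, and the simulated data are invented for illustration.

```python
import numpy as np

def aipw_ate(y, t, mu1, mu0, e):
    """Doubly robust (AIPW) ATE estimate from black-box nuisance predictions:
    mu1, mu0 estimate E[Y | T=1, X] and E[Y | T=0, X]; e estimates P(T=1 | X).
    The inverse-propensity correction debiases first-order errors in either
    nuisance, which is what makes the estimator 'structure-agnostic'."""
    return float(np.mean(mu1 - mu0
                         + t * (y - mu1) / e
                         - (1 - t) * (y - mu0) / (1 - e)))

# Toy check with the true nuisances on simulated data (true ATE = 2)
rng = np.random.default_rng(1)
n = 100_000
x = rng.standard_normal(n)
e = 1.0 / (1.0 + np.exp(-x))                  # propensity P(T=1 | X)
t = (rng.random(n) < e).astype(float)
y = 1.0 + 2.0 * t + x + rng.standard_normal(n)
est = aipw_ate(y, t, mu1=3.0 + x, mu0=1.0 + x, e=e)   # close to 2
```

In practice the nuisances would come from cross-fitted machine-learning models; the lower bounds in the paper say no first-order debiasing scheme can do better without structural assumptions on those models.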
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Comparing Two Proxy Methods for Causal Identification
Guo, Helen, Ogburn, Elizabeth L., Shpitser, Ilya
Identifying causal effects in the presence of unmeasured variables is a fundamental challenge in causal inference, for which proxy variable methods have emerged as a powerful solution. We contrast two major approaches in this framework: (1) bridge equation methods, which leverage solutions to integral equations to recover causal targets, and (2) array decomposition methods, which recover latent factors composing counterfactual quantities by exploiting unique determination of eigenspaces. We compare the model restrictions underlying these two approaches and provide insight into implications of the underlying assumptions, clarifying the scope of applicability for each method.
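The eigenspace idea behind the array decomposition approach can be illustrated on a toy with known matrices: two arrays that share latent factors but weight them with distinct ratios determine the factor matrix through an eigendecomposition. Everything below is a synthetic stand-in, not a quantity from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
M = rng.random((k, k)) + np.eye(k)      # latent factor matrix (unknown in practice)
d1 = np.array([1.0, 2.0, 3.0])          # component weights in observable array 1
d2 = np.array([5.0, 1.0, 2.0])          # component weights in observable array 2
P1 = M @ np.diag(d1) @ np.linalg.inv(M) # two observable arrays sharing the
P2 = M @ np.diag(d2) @ np.linalg.inv(M) # same latent eigenvectors

# P1 @ inv(P2) = M diag(d1/d2) inv(M); because the ratios d1/d2 are pairwise
# distinct, its eigendecomposition recovers the columns of M up to order/scale
vals, vecs = np.linalg.eig(P1 @ np.linalg.inv(P2))
```

This "unique determination of eigenspaces" step is what the decomposition methods exploit; the bridge-equation approach instead solves an integral equation and needs no such eigenvalue-separation condition, which is one axis of the comparison in the paper.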
- North America > United States > California > Alameda County > Hayward (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Africa > Senegal > Kolda Region > Kolda (0.04)
Row-stochastic matrices can provably outperform doubly stochastic matrices in decentralized learning
Liu, Bing, Kong, Boao, Lu, Limin, Yuan, Kun, Zhao, Chengcheng
Decentralized learning often involves a weighted global loss with heterogeneous node weights $\lambda$. We revisit two natural strategies for incorporating these weights: (i) embedding them into the local losses to retain a uniform weight (and thus a doubly stochastic matrix), and (ii) keeping the original losses while employing a $\lambda$-induced row-stochastic matrix. Although prior work shows that both strategies yield the same expected descent direction for the global loss, it remains unclear whether the Euclidean-space guarantees are tight and what fundamentally differentiates their behaviors. To clarify this, we develop a weighted Hilbert-space framework $L^2(\lambda;\mathbb{R}^d)$ and obtain convergence rates that are strictly tighter than those from Euclidean analysis. In this geometry, the row-stochastic matrix becomes self-adjoint whereas the doubly stochastic one does not, creating additional penalty terms that amplify consensus error, thereby slowing convergence. Consequently, the difference in convergence arises not only from spectral gaps but also from these penalty terms. We then derive sufficient conditions under which the row-stochastic design converges faster even with a smaller spectral gap. Finally, by using a Rayleigh-quotient and Loewner-order eigenvalue comparison, we further obtain topology conditions that guarantee this advantage and yield practical topology-design guidelines.
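Strategy (ii) can be seen in a toy decentralized gradient run: with a row-stochastic mixing matrix R, plain mix-then-step iterations drive all nodes toward the minimizer of the loss weighted by R's left Perron vector π, which is exactly the intended λ-weighting when R is λ-induced. The matrix, quadratic losses, and step size below are illustrative choices, not from the paper.

```python
import numpy as np

# Row-stochastic mixing matrix: rows sum to 1, columns do not
R = np.array([[0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3],
              [0.1, 0.1, 0.8]])
a = np.array([0.0, 3.0, 6.0])      # node i minimizes f_i(x) = (x - a_i)^2 / 2

# Left Perron vector pi: pi @ R = pi, pi >= 0, sum(pi) = 1
vals, vecs = np.linalg.eig(R.T)
pi = np.abs(vecs[:, np.argmax(vals.real)].real)
pi /= pi.sum()

eta, x = 0.002, np.zeros(3)
for _ in range(20_000):            # DGD: mix with neighbors, step on local gradient
    x = R @ x - eta * (x - a)

# Nodes reach consensus (up to an O(eta) error) at the pi-weighted solution
target = pi @ a                    # = 42/11 for this R and a
```

The self-adjointness argument in the abstract lives in the π-weighted inner product: in that geometry this R behaves like a symmetric averaging operator, which is what removes the extra consensus-error penalty terms.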
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Leisure & Entertainment (0.67)
- Health & Medicine (0.67)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Russia > North Caucasian Federal District > Republic of Karelia > Petrozavodsk (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
R1, R2, R4: Suggest more extensive analysis of Assumption 2 and the normalization step in Algorithm 1
We would like to thank the reviewers for their insightful feedback. In the following, we address their key concerns. Following the reviewers' suggestions, we will add a more thorough analysis in the final paper. Its advantages and applications are otherwise limited. Mixup was introduced in VPU as a regularizer to mitigate the overfitting problem (Table 4 and Lines 100-105, 376-384).
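For reference, the generic mixup operation forms convex combinations of paired examples and labels with a shared Beta-distributed weight; the sketch below is that generic input-space version only. VPU's actual regularizer applies a mixup-style consistency term to model outputs on positive and unlabeled data, so the function here is an illustrative stand-in.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.3, rng=None):
    """Generic mixup: draw lam ~ Beta(alpha, alpha) and return convex
    combinations of the inputs and of the labels using the same lam."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

# Mixed points lie on the segment between the originals, with matching labels
x_mix, y_mix = mixup(np.zeros(4), 0.0, np.ones(4), 1.0,
                     rng=np.random.default_rng(0))
```

Because the same weight mixes inputs and labels, training on mixed pairs pushes the model toward linear behavior between examples, which is the regularization effect the rebuttal refers to.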
Appendix Table of Contents

We prove the result for each of the three possible cases of the loss function. By Lemma A.3, for (x, y) ∈ X × Y, we have … Using Lemma A.2, we have … The ASO formulation above motivated the authors of [59] … Note that when Θ is a full-rank matrix, this decomposition is unique. Several personalized FL formulations, e.g., …

D.1 Client-Server Algorithm
Alg. 2 is a detailed version of Alg. 1 (FedEM), with local SGD used as the local solver. Alg. 3 gives our general algorithm for federated surrogate optimization, from which Alg. 2 is derived. (Algorithm 2: FedEM: Federated Expectation-Maximization. Input: data S …) Alg. 5 gives our general fully decentralized algorithm for federated surrogate optimization, from … As mentioned in Section 3.3, the convergence of decentralized optimization schemes requires certain … In our paper, we consider the following general assumption. We provide below the rigorous statement of Theorem 3.3, which was informally presented in …; its iterates satisfy the following inequalities after a large enough number of … In particular, we provide the assumptions under which Alg. 3 and Alg. 5 converge.
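The client-server round structure of a FedEM-style algorithm can be caricatured on a toy problem. A minimal sketch under strong simplifying assumptions: the components are 1-D Gaussian means with an exact per-round M-step, whereas the actual FedEM updates general component learners with local SGD; the client data and shapes here are invented.

```python
import numpy as np

def fedem_round(means, clients, sigma=1.0):
    """One FedEM-style round for component means of a toy 1-D Gaussian mixture.
    Each client runs the E-step on its local data and sends sufficient
    statistics; the server aggregates them into an exact M-step update."""
    num = np.zeros_like(means)
    den = np.zeros_like(means)
    for x in clients:                              # x: local 1-D sample array
        d2 = (x[:, None] - means[None, :]) ** 2
        q = np.exp(-d2 / (2.0 * sigma ** 2))       # E-step (unnormalized)
        q /= q.sum(axis=1, keepdims=True)          # responsibilities
        num += (q * x[:, None]).sum(axis=0)        # client statistics sent
        den += q.sum(axis=0)                       # to the server
    return num / den                               # server-side M-step

clients = [np.array([0.1, -0.1, 0.05]), np.array([5.1, 4.9, 5.0])]
means = np.array([-1.0, 6.0])
for _ in range(5):                                 # a few communication rounds
    means = fedem_round(means, clients)
# means converges to roughly [0.0, 5.0]
```

The point of the sketch is the division of labor: E-steps and statistics are computed locally, and only aggregates cross the network, which is the pattern Alg. 2 instantiates with local SGD as the inner solver.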